1. Introduction


An interpretation of a news event drawn from a single source can lead to an erroneous conclusion, especially if little (or nothing) is known about the validity of that source. For example, one person’s freedom fighter can be another’s terrorist (1). Thus, it is often better to have access to a broad collection of data and to be able to select specific reporting media. Software for the acquisition of real-time news published on the Internet has been developed at the U.S. Army Research Laboratory (ARL).

ARL’s rapid collection of current world event data exploits two key developments: the maturation of the semantic Web and the Google, Inc. AJAX search application programming interface (API). One reason for developing this tool is that news (both local and distant) on a particular topic, and the subsequent responses to it, can propagate quickly. Therefore, an attempt is made to provide the analyst with an aggregation of information on a subject as it becomes publicly available.

Development involved consideration of all six Google services as potential data sources: Google News, Web, Maps, Video, Book, and Blog. The first three (News, Web, and Maps) are being used for the collection of data. The Google News site (http://news.google.com) draws on some 10,584 sources (see http://blog.outer-court.com/googlenews/), including both national and international nodes. A Google Web search requires one to designate a particular uniform resource locator (URL), while an even more specific Maps search involves the actual specification of latitude and longitude for the point of interest.

Google Blog, Video, and Book are not included in the search controller at this time. Google Book is not pertinent to real-time analysis. In a Google Video search, “hits” result in images or illustrations, which is inconsistent with our intention of gathering news rather than interpreting it. Likewise, blogs (i.e., Web logs) tend to be opinionated and are not professionally reported (2).

The following sections provide technical details of information acquisition. The graphical user interface (GUI), which is illustrated in figure 1, is presented and described first. The next section gives an overview of asynchronous JavaScript and XML (Ajax),* where Google’s API is used to find the most recent news. Section 4 examines the method selected for scraping of data. The conclusion offers future considerations for enhanced capabilities.

The attached appendices include the actual XHTML, JavaScript, and Java code for the complete Web application. The cascading style sheet used in the XHTML can be obtained from Google (http://www.google.com/uds/css/gsearch.css).


* Google, Inc. uses the term AJAX throughout the description of its search API. The authors of this paper prefer Ajax, the term coined by its originator, Jesse James Garrett, because our approach encompasses technologies that are consistent with Ajax design.

Figure 1. Real-time news analysis (RTNA) GUI example with a Google Maps gadget. The satellite view of the area includes a marker at a location within Fallujah, Iraq.

2. RTNA Graphical User Interface


The results of a sample search, displayed in the Mozilla Firefox 1.5 browser, are illustrated in figure 1. At the very top of the GUI are the date and time the query was made. These are computed in a static JavaScript node of the XHTML document; static JavaScript is run automatically when the Web browser is loading. Each instance of a browser exposes many JavaScript objects, including Date, along with the document object model (DOM), which is an abstract interface that allows for manipulation of the document displayed in the browser.

The Date object uses the getMonth(), getDate(), and getFullYear() methods when constructing the string in the JavaScript node. Note also that the DOM includes events and their handlers (see Flanagan (3) for a thorough discussion of the JavaScript interface to the DOM of browsers).
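
As a minimal sketch of how such a date string might be assembled from the three methods named above, consider the following. The function name formatQueryDate and the constant MONTH_NAMES are illustrative assumptions, not names taken from the RTNA source.

```javascript
// Illustrative sketch only: formatQueryDate and MONTH_NAMES are assumed names.
var MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
                   "July", "August", "September", "October", "November",
                   "December"];

// Build a "Month D, YYYY" string from a JavaScript Date object.
// getMonth() is zero-based, so it can index the month-name array directly.
function formatQueryDate(date) {
  return MONTH_NAMES[date.getMonth()] + " " +
         date.getDate() + ", " + date.getFullYear();
}

// Usage: months are zero-based in the Date constructor, so 1 is February.
var queryDate = formatQueryDate(new Date(2007, 1, 16));
```

Because all three methods report local time, the string reflects the clock of the machine running the browser rather than that of the server.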

The GUI defines a search box that handles up to 10 tokens, where a token is defined as a string of characters surrounded by white space. The format for an entry, including stop words, is described in the book by Calishain and Dornfest (4). When the user is satisfied with a particular query, a search is started by pressing “Search.” This button is a field of the XHTML <form>, i.e., a child node <input> of type “button” and value “Search.” By using a button, the browser updates the page through the DOM and avoids a complete page reload, resulting in a much better response time.
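
The token definition above can be illustrated with a short sketch that splits a query on white space and checks it against the 10-token limit. The names MAX_TOKENS, tokenize, and isWithinTokenLimit are hypothetical; the RTNA source does not show how (or whether) the limit is checked on the client.

```javascript
// Illustrative sketch only: these names are assumptions, not RTNA code.
var MAX_TOKENS = 10;

// Split a query into tokens, i.e., strings of characters separated by
// white space. Leading and trailing white space is ignored.
function tokenize(query) {
  var trimmed = query.trim();
  return trimmed === "" ? [] : trimmed.split(/\s+/);
}

// Report whether a query fits within the search-box token limit.
function isWithinTokenLimit(query) {
  return tokenize(query).length <= MAX_TOKENS;
}

// Usage: this query has four tokens, so it is within the limit.
var tokens = tokenize("  Mahdi   Army OR al-Sadr ");
var accepted = isWithinTokenLimit("Mahdi Army OR al-Sadr");
```

Note that under this definition a quoted phrase such as "Mahdi Army" counts as two tokens, since the split considers white space only.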

The results of the request are then available. A result bar (see figures 2 and 3) exists for each Google service (remember that only Web, News, and Maps searchers have been added to the controller). In this example, a total of five Google Web searches and a Google News search are displayed. A Google Web search object (GwebSearch()) is necessary for each site selected; here, URLs for ABC, CBS, NBC, CNN, and Reuters news were chosen. Google News uses an algorithm to draw on a far larger selection of sites throughout the world.


Figure 2. Enabled state for 1, 4, and 8 (or more) search results.

Figure 3. Disabled state for 1, 4, and 8 (or more) search results.

There is also the option of a chronological sort of Google News results or of setting the center point for a Google Maps (Local) search. This involves invoking the appropriate method of the GnewsSearch object (setResultOrder()) or the GlocalSearch object (setCenter()), respectively, by simply clicking the icon (figure 4) and confirming the selection.

Figure 4. Icon defined for sorting of Google News by date, or selection of center point for the Google Maps service.

Finally, a Google gadget for a Maps mash-up has been added to the XHTML. The intent is a two-dimensional (2-D) visual display of an area, obtained by providing the latitude and longitude of the map center point. An accuracy of 10^-6 decimal degrees (e.g., designation of a specific building) is possible if Google data exists for that point. A search of the Google Local database is possible using its Maps API (see http://local.google.com), but here we only provide a display. An excellent discussion of both Google Maps and Google Earth is available in the book by Brown (5).
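
As a rough check on what an accuracy of 10^-6 decimal degrees means on the ground, the following sketch converts an angular difference to meters. It assumes a spherical Earth with about 111,320 m per degree of latitude, which is an approximation introduced here and not a figure from the RTNA software; the function names are likewise illustrative.

```javascript
// Illustrative sketch only. Assumes a spherical Earth with roughly
// 111,320 m per degree of latitude (an approximation, not an RTNA value).
var METERS_PER_DEGREE = 111320;

// Ground distance spanned by a difference in latitude, in meters.
function latitudeDegreesToMeters(degrees) {
  return degrees * METERS_PER_DEGREE;
}

// Ground distance spanned by a difference in longitude at a given latitude.
// Meridians converge toward the poles, hence the cosine factor.
function longitudeDegreesToMeters(degrees, atLatitudeDegrees) {
  var latitudeRadians = atLatitudeDegrees * Math.PI / 180;
  return degrees * METERS_PER_DEGREE * Math.cos(latitudeRadians);
}

// Usage: 10^-6 degrees of latitude is on the order of a tenth of a meter.
var northSouthMeters = latitudeDegreesToMeters(1e-6);
var eastWestMeters = longitudeDegreesToMeters(1e-6, 33.2130);
```

Under these assumptions, six decimal places resolve positions to roughly a tenth of a meter, which is more than enough to single out an individual building.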

The map panel in figure 1 shows the current view (an orthographic projection when viewing from infinity on the positive z axis). At the top-left corner of this map panel are the zoom and navigation controls. Instead of using defined increments, the user may choose to click and drag the map in any direction. At the other upper corner are view controls to switch between map and satellite; hybrid is a combination of both map and satellite.

Partial results for the figure 1 query are shown in figure 5. Note that a chronological sort had been specified, and that keywords are in bold-face type within the snippet. Also recall that an entire article is retrieved by simply clicking on the hypertext.


3. Google AJAX Searching for RTNA


A key for an AJAX search using the Google API can be obtained at http://code.google.com/apis/ajaxsearch/signup.html. The key value is necessary for a Google AJAX search and is supplied when the <script> node in the XHTML document is defined; this same attribute value is valid for Google Maps access (see appendix A, which provides the XHTML for RTNA). The result is a dynamic, highly responsive application that runs on your desktop but with all the advantages of being connected to the Internet.

Figure 5. Sample results for an “Al-Sadr” OR “Mahdi Army” news search at a given time.

There are many situations where one benefits from Ajax effects, e.g., DOM access in the browser using the JavaScript interface to the XHTML. But the actual asynchronous communication with a server is accomplished by:

1. creating the request object for communicating with a server,

2. telling the Web browser which JavaScript function to run when the request object ready state is 4 (see reference 3 for a complete description of ready states and status codes),

3. connecting to the Web server for communication, and

4. making the actual request to the Web server.

If these steps are satisfied, then the JavaScript request object can communicate with the Web server asynchronously. All five papers of an Ajax tutorial by McLaughlin (6) are available online.


Note that the previous four steps are taken care of for us by the Google AJAX search API. The intent here is to make the user aware of what is happening behind the scenes in a search, and to assist in future additions using an Ajax design. The XHTML in appendix A includes static JavaScript for determining the request object of a browser: a Microsoft ActiveXObject object for Internet Explorer or an XMLHttpRequest object for a Mozilla-based browser.
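
The four steps listed earlier in this section can be sketched in a few lines of JavaScript. This is an illustration under stated assumptions, not the RTNA implementation: the function name sendAsyncRequest and its createRequest parameter are hypothetical, and the Internet Explorer ActiveXObject fallback is omitted for brevity.

```javascript
// Illustrative sketch only: sendAsyncRequest and createRequest are assumed
// names. createRequest is an optional factory so that the request object
// can be substituted (e.g., for an ActiveXObject fallback or for testing).
function sendAsyncRequest(url, onSuccess, createRequest) {
  // Step 1: create the request object for communicating with a server.
  var request = createRequest ? createRequest() : new XMLHttpRequest();

  // Step 2: tell the browser which function to run when the ready state
  // changes; act only on ready state 4 (complete) with HTTP status 200.
  request.onreadystatechange = function () {
    if (request.readyState === 4 && request.status === 200) {
      onSuccess(request.responseText);
    }
  };

  // Step 3: connect to the Web server; the third argument selects an
  // asynchronous (true) rather than synchronous (false) request.
  request.open("GET", url, true);

  // Step 4: make the actual request (a GET sends no data to the server).
  request.send(null);

  return request;
}
```

In a browser, a call such as sendAsyncRequest("feed.xml", showFeed) returns immediately, and showFeed (a hypothetical handler) runs later when the response arrives; this is what keeps the page responsive during a search.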


4. Content Extraction


The intention of RTNA is to provide the text content of a news report as data. A news report discovered using the search functionality of RTNA is typically embedded in a document written in HTML. As a result, a mechanism is needed to find and extract the news report from this document. The content extraction software developed for RTNA uses an HTML DOM parser to locate and extract text content from HTML documents.

An HTML DOM parser transforms an HTML document into a tree structure with nodes representing tags (element nodes) and the text contained in the tags (text nodes). The content-extraction software developed for RTNA uses the CyberNeko HTML Parser (http://people.apache.org/~andyc/neko/doc/html/) to create the document tree. It then recursively traverses the tree to gather the text from text nodes, and the gathered text is returned as the content of the document. The text gathering is governed by one heuristic: an element node that is unlikely to contain any part of the news report is removed from the document tree. To illustrate, the element node representing a <script> tag is always removed, because the text nodes of this element are likely to contain unwanted JavaScript content instead of news report content. Additional heuristics to govern content extraction are the subject of future research.
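
The traversal and its single heuristic can be illustrated with a small sketch. It is written in JavaScript over a simplified tree of plain objects rather than in Java with the CyberNeko parser, so the node layout (type, tag, text, children) and the names SKIPPED_TAGS, gatherText, and extractContent are assumptions made for illustration. The sketch also skips excluded elements during the walk instead of first removing them from the tree, which yields the same text.

```javascript
// Illustrative sketch only: a simplified stand-in for a parsed DOM tree.
// Element nodes look like { type: "element", tag: "p", children: [...] };
// text nodes look like { type: "text", text: "..." }.

// Elements unlikely to hold news content. Only <script> is named in the
// text above; further tags could be added as more heuristics are developed.
var SKIPPED_TAGS = ["script"];

// Recursively collect the text of every text node below `node`,
// ignoring any subtree rooted at a skipped element.
function gatherText(node, parts) {
  parts = parts || [];
  if (node.type === "text") {
    var text = node.text.trim();
    if (text !== "") {
      parts.push(text);
    }
  } else if (node.type === "element" &&
             SKIPPED_TAGS.indexOf(node.tag.toLowerCase()) === -1) {
    var children = node.children || [];
    for (var i = 0; i < children.length; i++) {
      gatherText(children[i], parts);
    }
  }
  return parts;
}

// Return the gathered text as a single string.
function extractContent(root) {
  return gatherText(root).join(" ");
}

// Usage: the script text is dropped; the paragraph text is kept in order.
var sampleTree = {
  type: "element", tag: "body", children: [
    { type: "element", tag: "p",
      children: [{ type: "text", text: "Troops entered" }] },
    { type: "element", tag: "SCRIPT",
      children: [{ type: "text", text: "var x = 1;" }] },
    { type: "element", tag: "p",
      children: [{ type: "text", text: " the city. " }] }
  ]
};
var content = extractContent(sampleTree);
```

Keeping the excluded tags in one list makes it straightforward to experiment with additional heuristics without changing the traversal itself.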


5. Conclusion and Future Considerations


Currently, ARL’s RTNA gathers news for a user-specified topic using three of the six Google services available. A Google News search typically returns the very latest reports from some 10,584 sources around the world. A Google Web search, on the other hand, requires actual specification of the URL for the news source, which also gives more control over a search through selection of a particular source; this is accomplished by using the setSiteRestriction() method of a GwebSearch() object for a URL.

A Google search within the XHTML document makes use of a request/response model typical of Ajax technology. Searches are done asynchronously from the client browser. For example, the response may be a JavaServer Page, which is typically an HTML document, generated by a servlet running on the server. The result is a Web 2.0 application that has access to a very large set of resources but runs like a desktop application.

Now that this initial version is stable, additional capabilities are being considered. One should be fully aware of the objectivity of a story. For example, a story from a primary wire service is typically repeated down to the local level; perhaps a social network analysis (SNA) of an individual story would be instructive in seeing if and how the story has evolved (use of the JavaScript Date object within the XHTML will assist in this). SNA has already been done for a particular news medium (7), and the approach will be considered for inclusion. The other Google services (Blog, Video, and Book) will also be examined further for inclusion. Lastly, we plan on investigating a scalable vector graphics (SVG) display, which is simply 2-D graphics expressed in XML, of public opinion on a particular topic over time, generated from a <script> in our XHTML; for example, something similar to, or including, the spatial analysis of news as done at the State University of New York (5).

Appendix A. The RTNA XHTML 1.0 Document


The extensible hypertext markup language (XHTML) for ARL’s RTNA, including the internal cascading style sheet (CSS) and JavaScript, is included here. The external CSS can be found at the Google site http://www.google.com/uds/css/gsearch.css. Also, both a Google AJAX search API key and a Maps key must be defined; both are freely available after registering at http://code.google.com/apis/ajaxsearch/signup.html and http://www.google.com/maps/api_signup, respectively.


This  appendix  appears  in  its  original  form,  without  editorial  change. 

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<html lang="en-CA" xml:lang="en-CA" xmlns="http://www.w3.org/1999/xhtml">
<head>
<!-- ARL Real-Time News Analysis. -->
<!-- -->
<!-- by Andrew M. Neiderer, 21 February 2007. -->
<!-- -->
<!-- Note - -->
<!-- (1) callback code was borrowed and modified from Google Web site -->
<!-- http://code.google.com/apis/ajaxsearch/documentation/#SearchControlCallbacks. -->
<!-- (2) Google Map code originally by Doug Henderson -->
<!-- http://www3.telus.net/DougHenderson/. -->

<meta http-equiv="content-type" content="text/html; charset=ISO-8859-1" />
<meta http-equiv="window-target" content="_top" />

<title>
ARL Real-Time News Analysis
</title>

<!-- external and internal CSS -->
<link href="http://www.google.com/uds/css/gsearch.css"
      type="text/css" rel="stylesheet" />

<style type="text/css">
  body, table,
  body {
    font-family : Arial, Sans-serif;
    font-size : 13px;
  }

  title {
    text-align : center
  }

  img {
    align : middle
  }

  h1 {
    font-size : 18px;
    font-weight : bold;
    background-color : rgb(230,248,221);
    border-top : 1px solid rgb(128,198,90);
    text-align : center;
    margin-bottom : 10px;
    padding-bottom : 4px;
    color : #676767;
    text-align : center
  }

  h1 .tagline,
  h1 a .tagline {
    font-size : 13px;
    font-weight : normal;
    color : #676767;
    text-decoration : underline;
    cursor : pointer;
  }

  td {
    vertical-align : top;
  }

  td.searchControl {
    padding-left : 25px;
    width : 700px;
  }

  td.map {
    width : 550px;
  }

  #mapDiv {
    border : 1px solid #979797;
    width : 100%;
    height : 400px;
  }

  .gsc-keeper {
    display : none;
  }

  .gsc-localResult .gsc-keeper {
    display : block;
  }

  <!-- over-ride rule for gsearch.css; change width of -->
  <!-- search box: 2/7/07, Andrew M. Neiderer. -->
  .gsc-control {
    width : 600px;
  }

  table.gsc-search-box {
    width : 600px;
  }
</style>

<!-- external and internal JavaScript which includes support of Ajax request/response -->
<!-- modeling. Remember that static JavaScript, ie definitions outside a -->
<!-- function, is run automatically when the Web browser is loading. -->

<!-- Google AJAX search API key -->
<script src="http://www.google.com/uds/api?file=uds.js&amp;v=1.0&amp;key=ABQIAAAAZlK0slBCsTw_aBLk4taNOxQ5JOKSYN5SawKW
        type="text/javascript">
</script>

<!-- Google Maps API key -->
<script src="http://maps.google.com/maps?file=api&amp;v=2&amp;key=ABQIAAAAZlK0slBCsTw_aBLk4taNOxQ
        type="text/javascript">
</script>

<script src="rot13.js"
        type="text/javascript">
</script>

<script type="text/javascript">
//<![CDATA[

var date = new Date();
document.write(date.toString());

function googleDate(date)
{
  var month = ['January', 'February', 'March', 'April', 'May', 'June', 'July',
               'August', 'September', 'October', 'November', 'December'];

  return month[date.getMonth()] + ' ' +
         date.getDate() + ', ' + date.getFullYear();
}

var searchstring = "('al-Sadr' OR 'Al-Sadr' OR 'Mahdi Army') AND February AND 16 AND 2007";
// var searchstring = "('al-Sadr' OR 'Al-Sadr' OR 'Mahdi Army') AND '" + googleDate(date) + "'";

// Ajax
var request = null;            // request object
var requestURL =               // request URL
    "getUpdatedBoardSales-ajax.php";
var response = null;           // response from server
var lastXMLresponse = null;
var success = false;

// create request object statically for talking to Web server
try {
  // Mozilla-based browsers
  request = new XMLHttpRequest();
  success = true;
}
catch (microsoft) {
  // Microsoft
  var httpIds = new Array('MSXML2.XMLHTTP.5.0',
                          'MSXML2.XMLHTTP.4.0',
                          'MSXML2.XMLHTTP.3.0',
                          'MSXML2.XMLHTTP',
                          'Microsoft.XMLHTTP');
  for ( var i = 0; i < httpIds.length && !success; i++ ) {
    try {
      request = new ActiveXObject(httpIds[i]);
      if ( request != null )
        success = true;
    }
    catch (e) {
      alert("no IE request object!");
    }
  }
}

if ( request == null )
  alert("Error creating request object!");

// RSS formatter
function formatRSSdata(divname, response)
{
  var html = "";

  var docElt = response.documentElement;

  // if this does not work in IE, the content-type
  // in the header was likely not set to "text/xml".
  var items = docElt.getElementsByTagName('item');
  for ( var i = 0; i < items.length; i++ ) {
    var title = items[i].getElementsByTagName('title')[0];
    var link = items[i].getElementsByTagName('link')[0];
    html += "<b> <a href='" + link.firstChild.data + "'>" +
            title.firstChild.data + "</a> </b> <br>";

    var cbDetails = document.getElementById("cbDetails");
    if ( cbDetails.checked ) {
      var desc = items[i].getElementsByTagName('description')[0];
      html += "<font size='-1'>" +
              desc.firstChild.data +
              "</font>";
    }
  }

  var targetDiv = document.getElementById(divname);
  targetDiv.innerHTML = html;
}

// dynamic Ajax RSS news feed reader. Every Window object (see above comment
// in httpRequest()) has a document property, which represents the HTML
// document displayed in the window (p. 199 of "JavaScript" by Flanagan).
function getRSSfeed()
{
  // get selected RSS feed:
  var lblFeeds = document.getElementById("lblFeeds");
  if ( lblFeeds.value != null ) {
    httpRequest("GET", lblFeeds.value, true);
  }
}


// event handler for updating Web pages with response from Web server;
// uses DOM Document object of Web browser.
function handleResponse()
{
  if ( request.readyState == 4 ) {
    if ( request.status == 200 ) {
      // get the response from the server
      var customerAddress = request.responseText;

      // update the HTML Web <form>
      document.getElementById("address").value = customerAddress;
    }
  }
}

// initialise request object that has already been created
function initRequest(requestType, requestURL, syncOrAsync)
{
  try {
    // tell browser the function to run when request object
    // ready state changes.
    request.onreadystatechange = handleResponse;

    // connect to Web server and communicate
    request.open(requestType, requestURL, syncOrAsync);

    // actual request to connect to Web server
    // (server needs no data).
    request.send(null);
  }
  catch (err) {
    alert("The application cannot connect to server!");
  }
}


/* the request object communicates with the Web server.

   Parameters:
     requestType - GET or POST
     requestURL  - URL of server program; note that if no domain
                   name given, then request goes to same Web server
     syncOrAsync - synchronous (false) or asynchronous (true) request
*/
function httpRequest(requestType, requestURL, syncOrAsync)
{
  // initialize request object
  initRequest(requestType, requestURL, syncOrAsync);
}


//
function createMarker(latlng, html)
{
  html = '<div style="white-space:nowrap;">' + html + '</div>';
  var marker = new GMarker(latlng);
  GEvent.addListener(marker, "click", function()
  {
    GLog.write('enter marker click handler');
    GLog.writeHtml(html);
    marker.openInfoWindowHtml(html);
    GLog.write('exit marker click handler');
  });
  return marker;
}

//
function initMap(container)
{
  var zoomEvent;

  GLog.write('enter initMap()');

  if ( typeof(GMap2) != "undefined" ) {
    // Maps API version >= 2.36
    map = new GMap2(container);
    // [printout page header: Fri Feb 23 08:03:10 2007, rtnaAjaxCBandSSandGM.html, Page 7]
    zoomEvent = 'zoomend'
  }
  else {
    // Maps API version <= 2.35
    map = new GMap2(container);
    zoomEvent = 'zoom'
  }

  map.addControl(new GLargeMapControl());
  map.addControl(new GMapTypeControl());
  map.setCenter(new GLatLng(33.2130, 43.4620), 13);
  // map.setCenter(new GLatLng(37.4419, -122.1419), 13);

  GEvent.addListener(map, 'moveend',
    function()
    {
      GLog.write('moveend: ' + map.getCenter().toUrlValue() + ' zoom: ' + map.getZoom(), 'blue')
    });

  GEvent.addListener(map, zoomEvent,
    function(a, b)
    {
      GLog.write(zoomEvent + ' from ' + a + ' to ' + b, 'blue')
    });

  GEvent.addListener(map, 'maptypechanged',
    function()
    {
      GLog.write('maptypechanged: ' + map.getCurrentMapType().getName(), 'blue')
    });

  var latlng = new GLatLng(33.2130, 43.4620);
  var marker = createMarker(latlng, '33.2130 N lat and 43.4620 W long');
  // var latlng = new GLatLng(37.4419, -122.1419);
  // var marker = createMarker(latlng, '37.4419 N lat and 122.1419 E long');
  // var marker = createMarker(latlng, 'Welcome to Version 2<br>of the Google Maps API');

  map.addOverlay(marker);

  GLog.write('exit initMap()');
}


//
function initPage()
{
  GLog.write('enter initPage()');
  GLog.writeUrl(window.location.href);
  GLog.writeHtml('This HTML contains <strong>strong</strong>, <b>bold</b>, <strike>strike</strike>, <i>italic</i>,

  if ( !GBrowserIsCompatible() ) {
    alert("Your browser may not be compatible with the Google Maps system.\nPlease visit http://maps.google.com for
  }
  else {
    var mapDiv = document.getElementById("mapDiv");
    initMap(mapDiv);
  }

  GLog.write('Here is the URL to G_DEFAULT_ICON image');
  GLog.writeUrl(G_DEFAULT_ICON.image);
  GLog.writeHtml("This is a test <b>HTML</b> formated message<br />in two lines.");
  GLog.write('exit initPage()');
}

//

function appletvalue()
{
  document.myForm.q.value = document.myApplet.getHello();
  return true;
}

//
function onLoad()
{
  initPage();

  app = new _app();
}

//
function _app()
{
  // this.myMap = null;
  // this.markerList = new Array();

  // a map
  // if ( GBrowserIsCompatible() ) {
  //   this.myMap = new GMap2(document.getElementById("mapDiv"));
  //   this.myMap.setCenter(new GLatLng(37.3861, -122.083), 14);
  // }
  // this.myMap.addControl(new GSmallMapControl());

  // create search control, options
  var searchControl = new GSearchControl();
  //
  var options = new GsearcherOptions();
  options.setExpandMode(GSearchControl.EXPAND_MODE_OPEN);

  // full set of Google services (2/21/07)
  var webSearch = new GwebSearch();
  var newsSearch = new GnewsSearch();
  var localSearch = new GlocalSearch();
  var videoSearch = new GvideoSearch();
  var blogSearch = new GblogSearch();
  var bookSearch = new GbookSearch();


  // need var for each site in a Google Web search;
  // US sites and foreign sites
  var webSearch1 = new GwebSearch();
  var webSearch2 = new GwebSearch();
  var webSearch3 = new GwebSearch();
  var webSearch4 = new GwebSearch();
  var webSearch5 = new GwebSearch();
  var webSearch6 = new GwebSearch();
  var webSearch7 = new GwebSearch();
  var webSearch8 = new GwebSearch();
  var webSearch9 = new GwebSearch();


  webSearch1.setUserDefinedLabel("Web (CNN)");
  webSearch1.setSiteRestriction("www.cnn.com");

  webSearch2.setUserDefinedLabel("Web (USA Today)");
  webSearch2.setSiteRestriction("www.usatoday.com");

  webSearch3.setUserDefinedLabel("Web (CBS)");
  webSearch3.setSiteRestriction("www.cbsnews.com");

  webSearch4.setUserDefinedLabel("Web (ABC)");
  webSearch4.setSiteRestriction("abcnews.go.com");

  webSearch5.setSiteRestriction("www.cnn.com");
  webSearch6.setSiteRestriction("www.cnn.com");

  webSearch7.setUserDefinedLabel("Web (BBC)");
  webSearch7.setSiteRestriction("news.bbc.co.uk/2/hi/middle_east");

  webSearch8.setUserDefinedLabel("Web (Reuters)");
  webSearch8.setSiteRestriction("today.reuters.com/news");

  webSearch9.setUserDefinedLabel("Web (KUNA)");
  webSearch9.setSiteRestriction("http://www.kuna.net.kw");

  // make use of Google Web and Google News searchers
  searchControl.addSearcher(webSearch1);
  searchControl.addSearcher(webSearch2);
  searchControl.addSearcher(webSearch3);
  searchControl.addSearcher(webSearch4);
  // searchControl.addSearcher(webSearch5);
  // searchControl.addSearcher(webSearch6);
  searchControl.addSearcher(webSearch7);
  searchControl.addSearcher(webSearch8);
  searchControl.addSearcher(webSearch9);

  searchControl.addSearcher(newsSearch, options);

  // for Google Maps gadget
  searchControl.addSearcher(localSearch);

  // add Google Video searcher to controller
  // searchControl.addSearcher(videoSearch);

  // add Google Blog searcher to controller
  // searchControl.addSearcher(blogSearch);

  // add Google Book searcher to controller
  // searchControl.addSearcher(bookSearch);

  // tell the searcher to draw itself and where to attach
  searchControl.draw(document.getElementById("searchControlDiv"));

  // search control callbacks
  searchControl.setSearchCompleteCallback(this, onSearchComplete);
  searchControl.setSearchStartingCallback(this, onSearchStarting);
  searchControl.setOnKeepCallback(this, onKeep);

  // execute an initial search
  searchControl.execute(searchstring);
}

//
function onSearchComplete(searchControl, searcher)
{
  // if we have local search results, put them on the map
  if ( searcher.results && searcher.results.length > 0 ) {
    alert(searcher.results.length);

    for ( var i = 0; i < searcher.results.length; i++ ) {
      var result = searcher.results[i];

      // Google News service
      if ( result.GsearchResultClass == GnewsSearch.RESULT_CLASS )
        alert("Google News URL = " + result.url);

      // Google Web service
      if ( result.GsearchResultClass == GwebSearch.RESULT_CLASS )
        alert("Google Web URL = " + result.url);

      // Google Local service
      if ( result.GsearchResultClass == GlocalSearch.RESULT_CLASS ) {
        alert("Google Local URL = " + result.url);

        var markerObject = new Object();
        markerObject.result = result;
        markerObject.latLng = new GLatLng(parseFloat(result.lat), parseFloat(result.lng));
        markerObject.gmarker = new GMarker(markerObject.latLng);

        var clickHandler = method_closure(this, onMarkerClick, [markerObject]);
        GEvent.bind(markerObject.gmarker, "click", this, clickHandler);

        this.markerList.push(markerObject);
        this.myMap.addOverlay(markerObject.gmarker);
        result.__markerObject__ = markerObject;
      }

      // Google Video service
      if ( result.GsearchResultClass == GvideoSearch.RESULT_CLASS )
        alert("Google Video URL = " + result.url);
    }

    this.onMarkerClick(this.markerList[0]);
  }
}

//
function onSearchStarting(searchControl, searcher, query)
{
  // close the info window and clear markers
  this.myMap.closeInfoWindow();

  for ( var i = 0; i < this.markerList.length; i++ ) {
    var markerObject = this.markerList[i];
    this.myMap.removeOverlay(markerObject.gmarker);
  }

  this.markerList = new Array();
}

// 

function onKeep(result)
{
  if ( result.__markerObject__ ) {
    var markerObject = result.__markerObject__;
    this.onMarkerClick(markerObject);
  }
}

// 

function onMarkerClick(markerObject)
{
  this.myMap.closeInfoWindow();
  var htmlNode = markerObject.result.html.cloneNode(true);
  markerObject.gmarker.openInfoWindow(htmlNode);
}

// 

function method_closure(object, method, opt_argArray)
{
  return function()
  {
    return method.apply(object, opt_argArray);
  }
}

//]]>

</script>

</head> 

<body onload="onLoad()">

<img src="rtna.bmp" width="850" hspace="20" vspace="40"/>
<h1>

using the Google AJAX Search API
</h1>

<table> 

<tr> 

<td class="searchControl">
<div id="searchControlDiv">

Loading...

</div>

</td>

</tr>

<tr>

<td class="map">

<div id="mapDiv">

Loading...

<!-- Google Map gadget -->

</div>

</td>

</tr>

<tr> 

<td> 

&nbsp; 

</td> 

</tr> 

<tr> 

<td> 

<a href="http://www.lat-long.com/">
city, state (USA)

</a> 

</td> 

<td> 

or 

</td> 

<td> 

<a href="http://www.batchgeocode.com/lookup/">
street address, city, state (USA)

</a> 

</td> 

</tr> 

</table> 

</body> 

</html> 
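
The marker click handler in the listing above depends on method_closure to keep `this` pointing at the application object when the map fires the event. The following is a minimal sketch of that pattern that runs outside the browser. The `app` object, its `clicked` array, and the marker title are hypothetical stand-ins introduced only for illustration; they are not part of the RTNA code.

```javascript
// Same helper as in the listing: returns a function that calls `method`
// with `this` set to `object` and with the saved argument array.
function method_closure(object, method, opt_argArray) {
  return function () {
    return method.apply(object, opt_argArray);
  };
}

// Hypothetical stand-ins for the application object and a marker record.
var app = { name: "RTNA", clicked: [] };
var markerObject = { title: "marker 0" };

function onMarkerClick(marker) {
  // `this` is the app object because of method_closure,
  // not whatever object happens to invoke the handler.
  this.clicked.push(marker.title);
  return this.name + ":" + marker.title;
}

// Bind the handler once; it can then be called later with no arguments.
var clickHandler = method_closure(app, onMarkerClick, [markerObject]);
var outcome = clickHandler();
```

Because the argument array is captured when the closure is created, each marker can be given its own handler that remembers which markerObject it belongs to.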


Appendix  B.  Content  Extraction  Software 

The following Java code is executed for a particular URL. Ideally, this logic should (and eventually will) run as a
<script> in the RTNA XHTML document.


This  appendix  appears  in  its  original  form,  without  editorial  change. 
package mil.army.arl.rtna;

/*
 * author: John Richardson (jrichardson@arl.army.mil)
 *
 * This product contains software developed by Andy Clark.
 * http://people.apache.org/~andyc/neko/doc/html/
 * CyberNeko HTML Parser
 */

import org.cyberneko.html.parsers.DOMParser;
import org.w3c.dom.Node;


public class HTMLDOMParser {

    public static String elementFilter(Node node)
    {
        String article = new String();
        Node child = node.getFirstChild();
        String nodeName;
        Node tmp;

        while ( child != null ) {
            nodeName = child.getNodeName();
            tmp = child.getNextSibling();

            if ( nodeName.compareTo("#text") == 0 )              // text content
                article += (child.getTextContent());
            else if ( nodeName.compareTo("#comment") == 0 ||     // comment tags
                      nodeName.compareTo("SCRIPT") == 0 ||       // programming tags
                      nodeName.compareTo("NOSCRIPT") == 0 ||
                      nodeName.compareTo("OBJECT") == 0 ||
                      nodeName.compareTo("PARAM") == 0 ||
                      nodeName.compareTo("HEAD") == 0 ||         // meta info tags
                      nodeName.compareTo("TITLE") == 0 ||
                      nodeName.compareTo("META") == 0 ||
                      nodeName.compareTo("BASE") == 0 ||
                      nodeName.compareTo("IMG") == 0 ||          // image tags
                      nodeName.compareTo("MAP") == 0 ||
                      nodeName.compareTo("AREA") == 0 ||
                      nodeName.compareTo("UL") == 0 ||           // lists tags
                      nodeName.compareTo("OL") == 0 ||
                      nodeName.compareTo("LI") == 0 ||
                      nodeName.compareTo("DL") == 0 ||
                      nodeName.compareTo("DT") == 0 ||
                      nodeName.compareTo("DD") == 0 ||
                      nodeName.compareTo("FORM") == 0 ||         // input tags
                      nodeName.compareTo("INPUT") == 0 ||
                      nodeName.compareTo("TEXTAREA") == 0 ||
                      nodeName.compareTo("BUTTON") == 0 ||
                      nodeName.compareTo("SELECT") == 0 ||
                      nodeName.compareTo("OPTGROUP") == 0 ||
                      nodeName.compareTo("OPTION") == 0 ||
                      nodeName.compareTo("LABEL") == 0 ||
                      nodeName.compareTo("FIELDSET") == 0 ||
                      nodeName.compareTo("LEGEND") == 0 )
                node.removeChild(child);
            else
                article += elementFilter(child);
            child = tmp;
        }

        return article;
    }


    public static Node getHTMLDOMDocument(String target)
        throws Exception
    {
        DOMParser parser = new DOMParser();
        parser.parse(target);

        Node doc = parser.getDocument();
        return doc;
    }


    public static void main(String args[])
        throws Exception
    {
        // provide URL as argument
        System.out.println(HTMLDOMParser.elementFilter(HTMLDOMParser.getHTMLDOMDocument(args[0])));
    }

}
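
Appendix B states that the content filter should eventually run as a <script> in the RTNA XHTML document. The following is one possible sketch of such a port, not the authors' implementation. It assumes only the standard DOM Node members firstChild, nextSibling, nodeName, nodeValue, and removeChild; the lookup table name SKIPPED is hypothetical, and its entries are copied from the Java listing.

```javascript
// Node names whose subtrees are discarded (same set as the Java listing).
// Keys are upper case because nodeName is normalized below.
var SKIPPED = {
  "#COMMENT": true,
  "SCRIPT": true, "NOSCRIPT": true, "OBJECT": true, "PARAM": true,
  "HEAD": true, "TITLE": true, "META": true, "BASE": true,
  "IMG": true, "MAP": true, "AREA": true,
  "UL": true, "OL": true, "LI": true, "DL": true, "DT": true, "DD": true,
  "FORM": true, "INPUT": true, "TEXTAREA": true, "BUTTON": true,
  "SELECT": true, "OPTGROUP": true, "OPTION": true, "LABEL": true,
  "FIELDSET": true, "LEGEND": true
};

// Returns the concatenated text found beneath `node`.
// Skipped children are removed from `node` as a side effect.
function elementFilter(node) {
  var article = "";
  var child = node.firstChild;
  while (child !== null) {
    // Save the sibling first: removeChild detaches `child` from the tree.
    var next = child.nextSibling;
    // XHTML parsed as XML reports lower-case names, HTML upper-case ones.
    var name = child.nodeName.toUpperCase();
    if (name === "#TEXT") {
      article += child.nodeValue;
    } else if (SKIPPED[name] === true) {
      node.removeChild(child);
    } else {
      article += elementFilter(child);
    }
    child = next;
  }
  return article;
}
```

In a browser this could be called as elementFilter(document.body.cloneNode(true)). Working on a clone is a deliberate choice here, since the function removes nodes and would otherwise alter the displayed page.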




List  of  Symbols,  Abbreviations,  and  Acronyms 

Ajax     asynchronous JavaScript and XML
API      application programming interface
ARL      U.S. Army Research Laboratory
CSS      cascading style sheet
DOM      document object model
GUI      graphical user interface
HTML     hypertext markup language
RTNA     Real-Time News Analysis
SNA      Social Network Analysis
XHTML    extensible hypertext markup language
XML      extensible markup language